Why Large Closest String Instances Are Easy to Solve in Practice
Abstract
We initiate the study of the smoothed complexity of the Closest String problem by proposing a semi-random model of Hamming distance. We restrict our attention to the optimization version of the Closest String problem and give a 2-approximation algorithm, which we refer to as CSP-Greedy, that runs in O(nℓ + ℓ) time, where ℓ is the string length and n is the number of strings. Using smoothed analysis, we prove that CSP-Greedy achieves a (1 + 2ε)-approximation guarantee, where ε > 0 is a small value. These approximation and runtime guarantees demonstrate that Closest String instances with a relatively large number of input strings are solved efficiently in practice. We also give experimental results demonstrating that CSP-Greedy runs efficiently on instances with a large number of strings. The counterintuitive fact that large Closest String instances are relatively easy to solve efficiently gives new insight into this well-investigated problem.
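For orientation, the optimization objective mentioned in the abstract can be sketched in a few lines of Python. This is not CSP-Greedy, whose steps the abstract does not describe; it is only an illustration of the quantity being minimized (the largest Hamming distance from a candidate string to any input string), paired with a column-wise majority vote as a stand-in heuristic. The function names `hamming`, `radius`, and `majority_string` are hypothetical and chosen for this sketch.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(candidate, strings):
    """Closest String objective: largest Hamming distance to any input."""
    return max(hamming(candidate, s) for s in strings)


def majority_string(strings):
    """Assumed baseline, NOT CSP-Greedy: most common symbol per column.

    Ties go to the symbol seen first in that column.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


strings = ["ACGT", "ACGA", "TCGT"]
center = majority_string(strings)
print(center, radius(center, strings))  # prints "ACGT 1"
```

The majority string minimizes the sum of distances to the inputs, which need not minimize the maximum distance, so it should be read as a baseline candidate rather than an optimal center.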
Similar resources
A Closer Look at the Closest String and Closest Substring Problem
Let S be a set of k strings over an alphabet Σ; each string has a length between ℓ and n. The Closest Substring Problem (CSSP) is to find a minimal integer d (and a corresponding string t of length ℓ) such that each string s ∈ S has a substring of length ℓ with Hamming distance at most d to t. We say t is the closest substring to S. For ℓ = n, this problem is known as the Closest String Problem...
Combinatorial and Probabilistic Approaches to Motif Recognition
Short substrings of genomic data that are responsible for biological processes, such as gene expression, are referred to as motifs. Motifs with the same function may not entirely match, due to mutation events at a few of the motif positions. Allowing for non-exact occurrences significantly complicates their discovery. Given a number of DNA strings, the motif recognition problem is the task of d...
A Mathematical Model and Grouping Imperialist Competitive Algorithm for Integrated Quay Crane and Yard Truck Scheduling Problem with Non-crossing Constraint
In this research, an integrated approach is presented to simultaneously solve quay crane scheduling and yard truck scheduling problems. A mathematical model was proposed considering the main real-world assumptions such as quay crane non-crossing, precedence constraints and variable berthing times for vessels with the aim of minimizing vessels completion time. Based on the numerical results, thi...
An electromagnetism-like metaheuristic for open-shop problems with no buffer
This paper considers open-shop scheduling with no intermediate buffer to minimize total tardiness. This problem occurs in many production settings, in the plastic molding, chemical, and food processing industries. The paper mathematically formulates the problem by a mixed integer linear program. The problem can be optimally solved by the model. The paper also develops a novel metaheuristic base...
The Bounded Search Tree Algorithm for the Closest String Problem Has Quadratic Smoothed Complexity
Given a set S of n strings, each of length ℓ, and a nonnegative value d, we define a center string as a string of length ℓ that has Hamming distance at most d from each string in S. The Closest String problem aims to determine whether there exists a center string for a given set of strings S and input parameters n, ℓ, and d. When n is relatively large with respect to ℓ then the basic majority a...
Publication date: 2010